Abstract

In this work, a new Bayesian framework for OFDM channel estimation is proposed. Using Jaynes' maximum entropy principle to derive prior information, we successively tackle the situation in which only the channel delay spread is a priori known, then the situation in which it is not known. Exploitation of the time-frequency dimensions is also considered in this framework, to derive the optimal channel estimation associated with some performance measure under any state of knowledge. Simulations corroborate the optimality claim and always prove as good as or better in performance than classical estimators.

I. Introduction

Modern high rate wireless communication systems, such as IEEE-WiMAX [1] or 3GPP Long Term Evolution (LTE) [2], usually come along with large bandwidths. In multipath fading channels, this entails high frequency selectivity, which is theoretically beneficial since it provides system diversity. In practice, however, this constitutes a strong challenge for equalization. The orthogonal frequency division multiplexing (OFDM) modulation [3], considered as the scheme for most future wireless systems, allows for simplified equalization through pilot sequences scattered in the time-frequency grid and possibly over the space dimension when multiple antennas are used.

The challenge in channel estimation with a limited number of pilots lies in the optimal way to exploit all the information the receiver is provided with. Classical methods consider that only data received from pilot positions are informative. As a consequence, between pilot positions, the estimated channel must be reconstructed using interpolation techniques that would prove robust (whatever one means by robustness) in simulations [4]. It then appeared that a Bayesian minimum mean square error (MMSE) estimator [5] can be derived when not only the pilot sequences but also the channel covariance matrix are known. This solution coincides with the linear MMSE (LMMSE) estimator and thus provides optimal performance when the state of knowledge on the system is limited to those pilot data and to the channel covariance matrix [6]. However, when the channel covariance matrix is unknown, then again, only ad-hoc techniques were derived to cope with the lack of knowledge. The definite choice of a prior correlation matrix for the channel (the identity matrix or an exponentially decaying matrix for instance) is one of the classical approaches [7]. But all those approaches are only justified by good performance arising in selected simulations and do not provide any proof as to their overall performance.

In the following work, we tackle channel estimation for OFDM as a problem of inductive reasoning based on the available prior information and the received pilots. In particular, to recover missing information, we extensively use the maximum entropy principle, introduced by Shannon [8], extended by Jaynes [9] and accurately proven by Shore and Johnson [10] to be the desirable mathematical tool to cope with lack of information. Some of the aforementioned classical results shall be found anew and proven optimal in our information theoretic framework, while new results will be provided which are shown to perform better than classical approaches. The remainder of this paper is structured as follows: in Section II we introduce the pilot-aided OFDM system model, then in Section III we carry out the Bayesian channel estimation study based on different levels of knowledge. Simulations are then proposed in Section IV and a thorough discussion on the results, limitations and extendability of our scheme is handled in Section V. Then we draw our conclusions in Section VI.

Notations: In the following, boldface lower case symbols represent vectors, capital boldface characters denote matrices ($\mathbf{I}_N$ is the $N \times N$ identity matrix). The transposition operation is denoted $(\cdot)^T$. The Hermitian transpose is denoted $(\cdot)^H$. The operator $\operatorname{diag}(\mathbf{x})$ turns the vector $\mathbf{x}$ into a diagonal matrix. The symbol $\det(\mathbf{X})$ is the determinant of matrix $\mathbf{X}$. The symbol $E[\cdot]$ denotes expectation. The Kronecker delta function is denoted $\delta_x$; it equals 1 if $x = 0$ and 0 otherwise.

II. System Model 

We consider here a single cell OFDM system with $N$ subcarriers. The cyclic prefix (CP) length is $N_{CP}$ samples. In the time-frequency OFDM symbol grid, pilots are found in the symbol positions indexed by the function $\phi_t(n) \in \{0, 1\}$, which equals 1 if a pilot symbol is present at subcarrier $n$ and 0 otherwise; the subscript index $t$ denotes the OFDM symbol time. This is depicted in Figure 1.

Fig. 1. Time-frequency OFDM grid with pilot positions enhanced

Both data and pilots are gathered at time $t$ in a frequency-domain vector $\mathbf{s}_t \in \mathbb{C}^N$ with pilot entries of amplitude $|s_{t,k}|^2 = 1$, for all $t, k$ such that $\phi_t(k) = 1$. They are then sent through a channel of frequency response $\mathbf{h}_t \in \mathbb{C}^N$ with entries of mean power $E[|h_{t,k}|^2] = 1$, and background noise $\mathbf{n}_t \in \mathbb{C}^N$ with entries of mean power $E[|n_{t,k}|^2] = \sigma^2$. The time-domain representation of $\mathbf{h}_t$ is denoted $\boldsymbol{\nu}_t \in \mathbb{C}^L$ with $L$ the channel length, i.e. the number of time-domain samples which suffer inter-symbol interference. The OFDM frequency signal $\mathbf{y}_t \in \mathbb{C}^N$ received at time $t$ at the receiver is

$$\mathbf{y}_t = \operatorname{diag}(\mathbf{h}_t)\,\mathbf{s}_t + \mathbf{n}_t \tag{1}$$

This work aims at optimally estimating the vector $\mathbf{h}_t$ for some performance measure defined hereafter. The estimate shall be denoted $\hat{\mathbf{h}}_t \in \mathbb{C}^N$.
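As an illustration of model (1), the following minimal Python sketch draws one OFDM symbol. The regular pilot spacing, the QPSK alphabet used for the unit-modulus symbols, the i.i.d. Gaussian draw of the taps and all function names are assumptions made for this example only; they are not prescribed by the system model.

```python
import cmath
import random


def complex_gaussian(rng, variance):
    """Draw one circularly symmetric complex Gaussian sample with the given variance."""
    std = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def frequency_response(taps, n_subcarriers):
    """DFT of the L time-domain taps, evaluated on the N subcarriers."""
    return [
        sum(tap * cmath.exp(-2j * cmath.pi * l * n / n_subcarriers)
            for l, tap in enumerate(taps))
        for n in range(n_subcarriers)
    ]


def simulate_ofdm_symbol(n_subcarriers, channel_length, pilot_spacing, sigma2, seed=0):
    """Return (h, s, phi, y) for one OFDM symbol following y = diag(h) s + n."""
    rng = random.Random(seed)
    # Taps of variance 1/L give frequency-domain entries of unit mean power.
    taps = [complex_gaussian(rng, 1.0 / channel_length) for _ in range(channel_length)]
    h = frequency_response(taps, n_subcarriers)
    # phi[n] = 1 marks a pilot subcarrier (regular spacing is an assumption of this sketch).
    phi = [1 if n % pilot_spacing == 0 else 0 for n in range(n_subcarriers)]
    # Unit-modulus symbols on every subcarrier (QPSK alphabet, also an assumption).
    alphabet = [cmath.exp(1j * cmath.pi * (2 * q + 1) / 4) for q in range(4)]
    s = [rng.choice(alphabet) for _ in range(n_subcarriers)]
    noise = [complex_gaussian(rng, sigma2) for _ in range(n_subcarriers)]
    y = [h[n] * s[n] + noise[n] for n in range(n_subcarriers)]
    return h, s, phi, y
```

With `sigma2 = 0`, the ratios `y[n] / s[n]` at the pilot positions coincide with `h[n]`, which is the noiseless version of the pilot observations used by the estimators.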

Different states of knowledge at the receiver are considered in the subsequent work:

• the channel length $L$ is either known or unknown,

• at discrete time $t$, the pilots at time $(t - k)$ for $k \in \{1, \ldots, K\}$, as well as the channel time-correlation, are either known or unknown.

On top of those parameters, classical system parameters are supposed to be known (some were explicitly already used), 

• signal mean power 

• noise mean power 

• channel mean power. 

In each of the subsequent derivations, the exact quantity of knowledge will be clearly stated, since it is essential to the 
inductive inference we will perform on the channel h. 

III. Channel Estimation 

A. The channel length is known 

We first consider the simple scenario in which the channel power delay profile, i.e. the diagonal elements of the time-domain channel covariance matrix, is unknown but the channel length $L$ is known. Only pilot sequences received at discrete time $t$ are known to the receiver. That is, we assume that the receiver is able to store neither past received signals nor past estimates of the channel. This hypothesis will be relaxed in subsequent considerations. The amount of prior information, i.e. noise power, signal power and $L$, is denoted $I$. The amount of information that can be inferred about some entity $E$ from the prior information $I$ will be denoted $(E|I)$. For ease of reading, we will remove the index $t$ in the notations when unnecessary. We will also, $\forall k \in \{1, \ldots, N\}$, denote $h'_k = y_k/s_k = h_k + n_k/s_k$ and $\mathbf{h}' = (h'_1, \ldots, h'_N)^T$.

The only knowledge on the additive noise vector in the system is the mean power of its entries. The maximum entropy principle [13] then requires that the noise process be assigned a Gaussian independent and identically distributed (i.i.d.) density: $\mathbf{n} \sim \mathcal{CN}(0, \sigma^2 \mathbf{I}_N)$. From the channel model (1), equivalently, the multipath channel of length $L$ is only known to be of unit mean power. Again, the maximum entropy principle demands $\boldsymbol{\nu} \sim \mathcal{CN}(0, \frac{1}{L}\mathbf{I}_L)$. Since $\boldsymbol{\nu}$ is a Gaussian vector with i.i.d. entries, $\mathbf{h}$, its discrete Fourier transform, is a correlated Gaussian vector,

$$\mathbf{h} \sim \mathcal{CN}(0, \mathbf{Q})$$

with, for any couple $(n, m) \in \{1, \ldots, N\}^2$,
$$Q_{nm} = E[h_n h_m^*] = \frac{1}{L} \sum_{l=0}^{L-1} e^{-2\pi i \frac{l(n-m)}{N}}$$

To derive the optimal channel estimator, a decision regarding the targeted error function to be minimized has to be made. Following most previous contributions in this respect, we propose to take $\hat{\mathbf{h}}$ as the estimator that minimizes the mean quadratic estimation error (MMSE estimator), given the received signal $\mathbf{y}$. This is [5]
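$$\hat{\mathbf{h}} = E[\mathbf{h}|\mathbf{y}] = \int \mathbf{h}\, \frac{P(\mathbf{y}|\mathbf{h})\, P(\mathbf{h})}{P(\mathbf{y})}\, d\mathbf{h} = \lim_{\tilde{\mathbf{Q}} \to \mathbf{Q}} \frac{\int \mathbf{h}\, e^{-\mathbf{h}^H \tilde{\mathbf{Q}}^{-1} \mathbf{h}}\, e^{-\frac{1}{\sigma^2}(\mathbf{h}-\mathbf{h}')^H \mathbf{P}^H \mathbf{P} (\mathbf{h}-\mathbf{h}')}\, d\mathbf{h}}{\int e^{-\mathbf{h}^H \tilde{\mathbf{Q}}^{-1} \mathbf{h}}\, e^{-\frac{1}{\sigma^2}(\mathbf{h}-\mathbf{h}')^H \mathbf{P}^H \mathbf{P} (\mathbf{h}-\mathbf{h}')}\, d\mathbf{h}}$$

in which the limiting process is taken over a set of invertible matrices $\tilde{\mathbf{Q}}$ (which tends to $\mathbf{Q}$, itself by definition of rank $L < N$), and $\mathbf{P}$ is a projection matrix over the set of pilot frequency carriers ($P_{ij} = \delta_{i-j}\,\delta_{\phi(i)-1}$, i.e. $\mathbf{P}$ is diagonal with ones at the pilot positions).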

Let us note at this point that, from the received data $\mathbf{y}$, we only use the symbols indexed at pilot positions in the estimator above, hence the introduction of the projectors $\mathbf{P}$. This seems to go against our optimality claim. Indeed, one might object that data outside the pilot positions somehow carry information about the channel and should be taken into account. In the same manner, one could also claim that potential interferers are not correctly dealt with. Still, our problem is correctly formulated since $I$ does not carry any information about informative data apart from pilots, nor does it even suggest the presence of interferers. This epistemological discussion is further developed in Section V.

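The product of the exponential terms in the integrands can be written (by expansion and identification)

$$-\mathbf{h}^H \tilde{\mathbf{Q}}^{-1} \mathbf{h} - \frac{1}{\sigma^2}(\mathbf{h}-\mathbf{h}')^H \mathbf{P}^H \mathbf{P} (\mathbf{h}-\mathbf{h}') = -(\mathbf{h}-\mathbf{k})^H \mathbf{M} (\mathbf{h}-\mathbf{k}) - C$$

with

$$\begin{aligned} \mathbf{M} &= \tilde{\mathbf{Q}}^{-1} + \frac{1}{\sigma^2}\mathbf{P}^H\mathbf{P} \\ \mathbf{k} &= \frac{1}{\sigma^2}\mathbf{M}^{-1}\mathbf{P}^H\mathbf{P}\mathbf{h}' \\ C &= \frac{1}{\sigma^2}\mathbf{h}'^H\mathbf{P}^H\mathbf{P}\mathbf{h}' - \mathbf{k}^H\mathbf{M}\mathbf{k} \end{aligned} \tag{10, 11}$$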



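This allows us to isolate the dummy variable $\mathbf{h}$ in the integrals, and then leads to computing the first order moment of a multivariate Gaussian distribution (the constant factor $e^{-C}$ cancels between numerator and denominator),

\begin{align}
\hat{\mathbf{h}} &= \lim_{\tilde{\mathbf{Q}} \to \mathbf{Q}} \frac{\int \mathbf{h}\, e^{-(\mathbf{h}-\mathbf{k})^H \mathbf{M} (\mathbf{h}-\mathbf{k})}\, d\mathbf{h}}{\int e^{-(\mathbf{h}-\mathbf{k})^H \mathbf{M} (\mathbf{h}-\mathbf{k})}\, d\mathbf{h}} \tag{12} \\
&= \lim_{\tilde{\mathbf{Q}} \to \mathbf{Q}} \mathbf{k} \tag{13}
\end{align}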



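in which $\mathbf{k}$ depends on $\tilde{\mathbf{Q}}$ through $\mathbf{M}$.

Noting that $\mathbf{M}^{-1} = (\tilde{\mathbf{Q}}^{-1} + \frac{1}{\sigma^2}\mathbf{P}^H\mathbf{P})^{-1} = (\mathbf{I}_N + \frac{1}{\sigma^2}\tilde{\mathbf{Q}}\mathbf{P}^H\mathbf{P})^{-1}\tilde{\mathbf{Q}}$, $\hat{\mathbf{h}}$ is then

\begin{align}
\hat{\mathbf{h}} &= \lim_{\tilde{\mathbf{Q}} \to \mathbf{Q}} (\sigma^2\mathbf{I}_N + \tilde{\mathbf{Q}}\mathbf{P}^H\mathbf{P})^{-1}\tilde{\mathbf{Q}}\mathbf{P}^H\mathbf{P}\mathbf{h}' \tag{14} \\
&= (\sigma^2\mathbf{I}_N + \mathbf{Q}\mathbf{P}^H\mathbf{P})^{-1}\mathbf{Q}\mathbf{P}^H\mathbf{P}\mathbf{h}' \tag{15}
\end{align}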



This is exactly the well-known LMMSE solution [5]. However, this result is not yet another demonstration of LMMSE as it is classically derived. We used here the maximum entropy principle to observe that, in the end, the limited knowledge on the channel length $L$ mathematically gives the same estimate as when one "assumes" an a priori covariance matrix $\mathbf{Q}$. Therefore the intuitive classical solution is the correct estimate in the sense of maximum entropy [13].
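The estimator (15) can be evaluated numerically along the following lines. This is a minimal pure-Python sketch, assuming the covariance $Q_{nm} = \frac{1}{L}\sum_{l=0}^{L-1} e^{-2\pi i\, l(n-m)/N}$ implied by the i.i.d. tap model; the function names and the hand-written Gaussian elimination are choices of this illustration, not part of the derivation.

```python
import cmath


def covariance(n_subcarriers, channel_length):
    """Q[n][m] = (1/L) * sum_{l<L} exp(-2*pi*i*l*(n-m)/N) (assumed i.i.d. tap model)."""
    N, L = n_subcarriers, channel_length
    return [[sum(cmath.exp(-2j * cmath.pi * l * (n - m) / N) for l in range(L)) / L
             for m in range(N)] for n in range(N)]


def solve(a, b):
    """Solve a x = b for a square complex matrix by Gaussian elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def lmmse_estimate(h_prime, phi, channel_length, sigma2):
    """Evaluate (sigma^2 I + Q P^H P)^{-1} Q P^H P h' for a pilot mask phi."""
    N = len(h_prime)
    q = covariance(N, channel_length)
    # P^H P is diagonal with ones at pilot positions, so Q P^H P zeroes non-pilot columns.
    qp = [[q[n][m] * phi[m] for m in range(N)] for n in range(N)]
    a = [[qp[n][m] + (sigma2 if n == m else 0.0) for m in range(N)] for n in range(N)]
    rhs = [sum(qp[n][m] * h_prime[m] for m in range(N)) for n in range(N)]
    return solve(a, rhs)
```

Entries of `h_prime` outside the pilot positions are ignored, since the corresponding columns of $\mathbf{Q}\mathbf{P}^H\mathbf{P}$ vanish. When the number of pilots is at least $L$ and the noise is small, the estimate is expected to be close to the true frequency response on all subcarriers.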

B. Unknown channel length 

If $L$ is only known to be in an interval $\{L_{\min}, \ldots, L_{\max}\}$, the maximum entropy principle assigns a uniform prior distribution for $L$; otherwise one would add undesirable implicit information to the current state of knowledge. The channel MMSE estimator is then given by



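\begin{align}
\hat{\mathbf{h}} &= E[\mathbf{h}|\mathbf{y}] \tag{16} \\
&= \int \mathbf{h}\, \frac{\left(\sum_{L} P(\mathbf{h}|L) P(L)\right) P(\mathbf{y}|\mathbf{h})}{P(\mathbf{y})}\, d\mathbf{h} \tag{17} \\
&= \lim_{\tilde{\mathbf{Q}}_L \to \mathbf{Q}_L} \frac{\sum_{L=L_{\min}}^{L_{\max}} \frac{1}{\det(\tilde{\mathbf{Q}}_L)} \int \mathbf{h}\, e^{-\mathbf{h}^H \tilde{\mathbf{Q}}_L^{-1} \mathbf{h}}\, e^{-\frac{1}{\sigma^2}(\mathbf{h}-\mathbf{h}')^H \mathbf{P}^H \mathbf{P} (\mathbf{h}-\mathbf{h}')}\, d\mathbf{h}}{\sum_{L=L_{\min}}^{L_{\max}} \frac{1}{\det(\tilde{\mathbf{Q}}_L)} \int e^{-\mathbf{h}^H \tilde{\mathbf{Q}}_L^{-1} \mathbf{h}}\, e^{-\frac{1}{\sigma^2}(\mathbf{h}-\mathbf{h}')^H \mathbf{P}^H \mathbf{P} (\mathbf{h}-\mathbf{h}')}\, d\mathbf{h}} \tag{18}
\end{align}

where $\mathbf{Q}_k$ is the channel covariance matrix for a channel length $k \in \{L_{\min}, \ldots, L_{\max}\}$ and the $\tilde{\mathbf{Q}}_k$ are taken in a set of invertible matrices in the neighborhood of $\mathbf{Q}_k$.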

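Using the same transformations as in (10) and the fact that the numerator and denominator constants in (18) do not simplify any longer, we end up with

$$\hat{\mathbf{h}} = \lim_{\tilde{\mathbf{Q}}_L \to \mathbf{Q}_L} \frac{\sum_{L=L_{\min}}^{L_{\max}} \det(\tilde{\mathbf{M}}^{(L)}\tilde{\mathbf{Q}}_L)^{-1}\, e^{-\tilde{C}^{(L)}}\, \tilde{\mathbf{k}}^{(L)}}{\sum_{L=L_{\min}}^{L_{\max}} \det(\tilde{\mathbf{M}}^{(L)}\tilde{\mathbf{Q}}_L)^{-1}\, e^{-\tilde{C}^{(L)}}} \tag{19}$$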



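in which we updated our previous notations to incorporate the dependence on $L$,

$$\begin{aligned} \tilde{\mathbf{M}}^{(L)} &= \tilde{\mathbf{Q}}_L^{-1} + \frac{1}{\sigma^2}\mathbf{P}^H\mathbf{P} \\ \tilde{\mathbf{k}}^{(L)} &= \frac{1}{\sigma^2}(\tilde{\mathbf{M}}^{(L)})^{-1}\mathbf{P}^H\mathbf{P}\mathbf{h}' \\ \tilde{C}^{(L)} &= \frac{1}{\sigma^2}\mathbf{h}'^H\mathbf{P}^H\mathbf{P}\mathbf{h}' - (\tilde{\mathbf{k}}^{(L)})^H\tilde{\mathbf{M}}^{(L)}\tilde{\mathbf{k}}^{(L)} \end{aligned} \tag{20}$$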
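The determinant $\det(\tilde{\mathbf{M}}^{(L)})$ can be further developed as

$$\det(\tilde{\mathbf{M}}^{(L)}) = \det\left(\mathbf{I}_N + \frac{1}{\sigma^2}\tilde{\mathbf{Q}}_L\mathbf{P}^H\mathbf{P}\right) \cdot \det(\tilde{\mathbf{Q}}_L)^{-1} \tag{21}$$

which entails

$$\det(\tilde{\mathbf{M}}^{(L)}\tilde{\mathbf{Q}}_L) = \det\left(\mathbf{I}_N + \frac{1}{\sigma^2}\tilde{\mathbf{Q}}_L\mathbf{P}^H\mathbf{P}\right) \tag{22}$$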



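No inversion of the $\tilde{\mathbf{Q}}_k$ matrices is then necessary, so that the limiting process is now straightforward,

$$\hat{\mathbf{h}} = \frac{\sum_{L=L_{\min}}^{L_{\max}} \det\left(\mathbf{I}_N + \frac{1}{\sigma^2}\mathbf{Q}_L\mathbf{P}^H\mathbf{P}\right)^{-1} e^{-C^{(L)}}\, \mathbf{k}^{(L)}}{\sum_{L=L_{\min}}^{L_{\max}} \det\left(\mathbf{I}_N + \frac{1}{\sigma^2}\mathbf{Q}_L\mathbf{P}^H\mathbf{P}\right)^{-1} e^{-C^{(L)}}} \tag{23}$$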



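in which $\mathbf{k}^{(L)}$ and $C^{(L)}$ are the limits of $\tilde{\mathbf{k}}^{(L)}$ and $\tilde{C}^{(L)}$ respectively,

$$\begin{aligned} \mathbf{k}^{(L)} &= (\sigma^2\mathbf{I}_N + \mathbf{Q}_L\mathbf{P}^H\mathbf{P})^{-1}\mathbf{Q}_L\mathbf{P}^H\mathbf{P}\mathbf{h}' \\ C^{(L)} &= \frac{1}{\sigma^2}\,\mathbf{h}'^H \left(\left(\mathbf{I}_N + \frac{1}{\sigma^2}\mathbf{Q}_L\mathbf{P}^H\mathbf{P}\right)^{-1}\right)^H \mathbf{P}^H\mathbf{P}\,\mathbf{h}' \end{aligned} \tag{24}$$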

Since $C^{(L)}$ comprises the quadratic term $\mathbf{h}'^H\mathbf{P}^H\mathbf{P}\mathbf{h}'$, the MMSE estimate of $\mathbf{h}$ is not linear in $\mathbf{h}'$; therefore the LMMSE estimate in the scenario when $L$ is unknown does not coincide with the MMSE estimate. We also note that formula (23) is no more than a weighted combination of the individual LMMSE estimates for the different hypothetical values of $L$. The weighting coefficients enhance the estimates that best fit the correct $L$ hypothesis and discard those estimates that do not agree with the $\mathbf{h}'$ observation.
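The weighted combination (23) can be sketched as follows. This minimal pure-Python illustration assumes the same covariance model as above, $Q_{nm} = \frac{1}{L}\sum_{l<L} e^{-2\pi i\, l(n-m)/N}$, and uses the identities $\mathbf{k}^{(L)} = \mathbf{P}\mathbf{h}' - \mathbf{z}$ and $C^{(L)} = \frac{1}{\sigma^2}(\mathbf{P}\mathbf{h}')^H\mathbf{z}$ with $\mathbf{z} = (\mathbf{I}_N + \frac{1}{\sigma^2}\mathbf{Q}_L\mathbf{P}^H\mathbf{P})^{-1}\mathbf{P}\mathbf{h}'$, which follow from (24) through $(\mathbf{I}+\mathbf{A})^{-1}\mathbf{A} = \mathbf{I} - (\mathbf{I}+\mathbf{A})^{-1}$. Working with log-weights is an implementation choice of this sketch to avoid numerical underflow, and all names are hypothetical.

```python
import cmath
import math


def covariance(n_subcarriers, channel_length):
    """Q[n][m] = (1/L) * sum_{l<L} exp(-2*pi*i*l*(n-m)/N) (assumed i.i.d. tap model)."""
    N, L = n_subcarriers, channel_length
    return [[sum(cmath.exp(-2j * cmath.pi * l * (n - m) / N) for l in range(L)) / L
             for m in range(N)] for n in range(N)]


def solve_with_logdet(a, b):
    """Solve a x = b by Gaussian elimination with pivoting; also return log|det(a)|."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        logdet += math.log(abs(aug[col][col]))
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x, logdet


def mmse_unknown_length(h_prime, phi, lengths, sigma2):
    """Mix the per-length LMMSE estimates k^(L) with weights det(.)^-1 * exp(-C^(L))."""
    N = len(h_prime)
    hp = [h_prime[n] if phi[n] else 0j for n in range(N)]  # P h'
    log_weights, candidates = [], []
    for L in lengths:
        q = covariance(N, L)
        a = [[(1.0 if n == m else 0.0) + q[n][m] * phi[m] / sigma2 for m in range(N)]
             for n in range(N)]
        z, logdet = solve_with_logdet(a, hp)
        c = sum((hp[n].conjugate() * z[n]).real for n in range(N)) / sigma2
        log_weights.append(-logdet - c)
        candidates.append([hp[n] - z[n] for n in range(N)])
    top = max(log_weights)
    weights = [math.exp(v - top) for v in log_weights]
    total = sum(weights)
    posterior = [w / total for w in weights]  # normalized weights over the tested lengths
    estimate = [sum(posterior[i] * candidates[i][n] for i in range(len(lengths)))
                for n in range(N)]
    return estimate, posterior
```

The normalized weights returned alongside the estimate can be read as the posterior probabilities of the tested channel lengths under the uniform prior on $L$.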

C. Using time correlation 

When the channel coherence time, defined as the typical duration for which the channel realizations are correlated [11], is 
of the same order or larger than a few OFDM symbols, then past (and future) received data carry important information on 
the present channel. This information must be taken into account. 

Classically, channel time correlation is described through Jakes' model [12]. For a Doppler spread $f_d$ (proportional to the vehicular speed), the correlation figure is modeled as

$$E[\nu_{t+T,p}\, \nu^*_{t,p}] = \frac{1}{L} J_0(2\pi f_d T) \tag{25}$$

in which $p$ is one of the paths of the multipath channel $\boldsymbol{\nu}$ and $J_0$ is the zeroth-order Bessel function of the first kind.

This model actually makes two assumptions that, under the proper information setting, can be turned into the output of a maximum entropy process. Those assumptions [18] are

• the signal scatterers are uncorrelated, in the sense that two rays arriving at different angles to the receiver face independent attenuation properties. Under no knowledge on the environmental scatterers, this has to be the logical assumption.

• the angles of arrival (i.e. the angle between the antenna body and the incoming wave) are uniformly distributed. Again, this is what the maximum entropy principle would state if no particular knowledge on the positions of the scatterers, transmitter and receiver is a priori given.

For those reasons, Jakes' model is reasonable when no geometrical information on the channel is given. Practically speaking, it will be difficult for the receiver to be aware of the exact Doppler frequency $f_d$. In the following theoretical derivations and in the forthcoming simulations, we shall consider that the receiver exactly knows the expected value of equation (25). It is of course possible either to find estimates for $E[\nu_{t+T,p}\, \nu^*_{t,p}]$ or to complete the subsequent study by integrating out the possible Doppler frequencies given a prior distribution for $f_d$.
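The correlation factor of equation (25) is easily evaluated numerically. The sketch below uses the classical integral representation $J_0(x) = \frac{1}{\pi}\int_0^{\pi}\cos(x\sin\theta)\,d\theta$, an average over an angle that mirrors the uniform angle-of-arrival assumption, computed with a midpoint rule. The function names and the number of integration steps are choices of this example.

```python
import math


def bessel_j0(x, steps=2000):
    """J0(x) = (1/pi) * integral over [0, pi] of cos(x sin(theta)), midpoint rule."""
    width = math.pi / steps
    total = sum(math.cos(x * math.sin((i + 0.5) * width)) for i in range(steps))
    return total / steps


def jakes_correlation(doppler_hz, delay_s):
    """Normalized time-correlation factor J0(2 pi f_d T) for a delay of T seconds."""
    return bessel_j0(2.0 * math.pi * doppler_hz * delay_s)
```

For a zero Doppler spread the factor equals 1 (fully correlated channels), and it first vanishes when $2\pi f_d T$ reaches the first zero of $J_0$, close to 2.405.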

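Let us consider the simple scenario in which only the present and last past pilot symbols are considered by the terminal. Those correspond to two time instants $t_1$ and $t_2$, respectively. We also consider first that $L$ is known. For notational simplicity, we shall denote $\mathbf{h}_k = \mathbf{h}_{t_k}$. The MMSE estimator for $\mathbf{h}_2$ under this state of knowledge is then

\begin{align}
\hat{\mathbf{h}}_2 &= E[\mathbf{h}_2 | \mathbf{h}'_1 \mathbf{h}'_2] \tag{26} \\
&= \int_{\mathbf{h}_2} \mathbf{h}_2\, \frac{P(\mathbf{h}'_1 \mathbf{h}'_2 | \mathbf{h}_2)\, P(\mathbf{h}_2)}{P(\mathbf{h}'_1 \mathbf{h}'_2)}\, d\mathbf{h}_2 \tag{27} \\
&= \int_{\mathbf{h}_2} \mathbf{h}_2\, \frac{P(\mathbf{h}_2)\, P(\mathbf{h}'_2 | \mathbf{h}'_1 \mathbf{h}_2)\, P(\mathbf{h}'_1 | \mathbf{h}_2)}{P(\mathbf{h}'_1 \mathbf{h}'_2)}\, d\mathbf{h}_2 \tag{28} \\
&= \int_{\mathbf{h}_2} \mathbf{h}_2\, \frac{P(\mathbf{h}_2)\, P(\mathbf{h}'_2 | \mathbf{h}_2)\, P(\mathbf{h}'_1 | \mathbf{h}_2)}{P(\mathbf{h}'_1 \mathbf{h}'_2)}\, d\mathbf{h}_2 \tag{29} \\
&= \int_{\mathbf{h}_2} \mathbf{h}_2\, \frac{P(\mathbf{h}_2)\, P(\mathbf{h}'_2 | \mathbf{h}_2) \int_{\mathbf{h}_1} P(\mathbf{h}'_1 | \mathbf{h}_2 \mathbf{h}_1)\, P(\mathbf{h}_1 | \mathbf{h}_2)\, d\mathbf{h}_1}{P(\mathbf{h}'_1 \mathbf{h}'_2)}\, d\mathbf{h}_2 \tag{30} \\
&= \int_{\mathbf{h}_2} \mathbf{h}_2\, \frac{P(\mathbf{h}_2)\, P(\mathbf{h}'_2 | \mathbf{h}_2) \int_{\mathbf{h}_1} P(\mathbf{h}'_1 | \mathbf{h}_1)\, P(\mathbf{h}_1 | \mathbf{h}_2)\, d\mathbf{h}_1}{P(\mathbf{h}'_1 \mathbf{h}'_2)}\, d\mathbf{h}_2 \tag{31}
\end{align}

in which equations (29) and (31) are verified since $\mathbf{h}'_1$ and $\mathbf{h}_2$ do not bring any additional information to $(\mathbf{h}'_2|\mathbf{h}_2)$ and $(\mathbf{h}'_1|\mathbf{h}_1)$ respectively.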

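At this point, we recognize that, apart from the new term $P(\mathbf{h}_1|\mathbf{h}_2)$, all the probabilities to be derived here have already been produced in the previous sections. Now, our knowledge on $(\mathbf{h}_1|\mathbf{h}_2)$ is limited to equation (25). Burg's theorem [17] then states that the maximum entropy distribution for $(\boldsymbol{\nu}_1|\boldsymbol{\nu}_2)$ is an $L$-multivariate Gaussian distribution of mean $\lambda\boldsymbol{\nu}_2$ and variance $\frac{1}{L}(1-\lambda^2)\mathbf{I}_L$ with $\lambda = J_0(2\pi f_d T)$. Therefore, thanks to the same linearity argument as above, the distribution of $(\mathbf{h}_1|\mathbf{h}_2)$ is given by

$$P(\mathbf{h}_1|\mathbf{h}_2) = \lim_{\tilde{\boldsymbol{\Phi}} \to \boldsymbol{\Phi}} \frac{1}{\pi^N \det(\tilde{\boldsymbol{\Phi}})}\, e^{-(\mathbf{h}_1 - \lambda\mathbf{h}_2)^H \tilde{\boldsymbol{\Phi}}^{-1} (\mathbf{h}_1 - \lambda\mathbf{h}_2)}$$

with

$$\tilde{\boldsymbol{\Phi}} = (1-\lambda^2)\,\tilde{\mathbf{Q}}$$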

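Consider first the inner integral in equation (31). Similarly to above, we can express

$$P(\mathbf{h}_1|\mathbf{h}_2)\, P(\mathbf{h}'_1|\mathbf{h}_1) = \lim_{\tilde{\mathbf{Q}} \to \mathbf{Q}} \frac{e^{-(\mathbf{h}_1-\mathbf{k}_1)^H \mathbf{M}_1 (\mathbf{h}_1-\mathbf{k}_1) - \tilde{C}_1}}{\pi^{M_1+N} \sigma^{2M_1} \det(\tilde{\boldsymbol{\Phi}})}$$

with

$$\begin{aligned} \mathbf{M}_1 &= \tilde{\boldsymbol{\Phi}}^{-1} + \frac{1}{\sigma^2}\mathbf{P}_1^H\mathbf{P}_1 \\ \mathbf{k}_1 &= \mathbf{M}_1^{-1}\left(\lambda\tilde{\boldsymbol{\Phi}}^{-1}\mathbf{h}_2 + \frac{1}{\sigma^2}\mathbf{P}_1^H\mathbf{P}_1\mathbf{h}'_1\right) \\ &= \left(\mathbf{I}_N + \frac{1}{\sigma^2}\tilde{\boldsymbol{\Phi}}\mathbf{P}_1^H\mathbf{P}_1\right)^{-1}\left(\lambda\mathbf{h}_2 + \frac{1}{\sigma^2}\tilde{\boldsymbol{\Phi}}\mathbf{P}_1^H\mathbf{P}_1\mathbf{h}'_1\right) \\ \tilde{C}_1 &= \lambda^2\mathbf{h}_2^H\tilde{\boldsymbol{\Phi}}^{-1}\mathbf{h}_2 + \frac{1}{\sigma^2}\mathbf{h}'^H_1\mathbf{P}_1^H\mathbf{P}_1\mathbf{h}'_1 - \mathbf{k}_1^H\mathbf{M}_1\mathbf{k}_1 \\ &= \lambda^2\mathbf{h}_2^H\tilde{\boldsymbol{\Phi}}^{-1}\mathbf{h}_2 + \frac{1}{\sigma^2}\mathbf{h}'^H_1\mathbf{P}_1^H\mathbf{P}_1\mathbf{h}'_1 - \left(\lambda\mathbf{h}_2 + \frac{1}{\sigma^2}\tilde{\boldsymbol{\Phi}}\mathbf{P}_1^H\mathbf{P}_1\mathbf{h}'_1\right)^H \tilde{\boldsymbol{\Phi}}^{-1}\left(\mathbf{I}_N + \frac{1}{\sigma^2}\tilde{\boldsymbol{\Phi}}\mathbf{P}_1^H\mathbf{P}_1\right)^{-1}\left(\lambda\mathbf{h}_2 + \frac{1}{\sigma^2}\tilde{\boldsymbol{\Phi}}\mathbf{P}_1^H\mathbf{P}_1\mathbf{h}'_1\right) \end{aligned}$$

and $M_1$ (not to be confused with the matrix $\mathbf{M}_1$) is the number of pilot positions in the first pilot sequence.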
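Now, the integration of the part dependent on $\mathbf{h}_1$ gives

$$\int_{\mathbf{h}_1} e^{-(\mathbf{h}_1-\mathbf{k}_1)^H \mathbf{M}_1 (\mathbf{h}_1-\mathbf{k}_1)}\, d\mathbf{h}_1 = \frac{\pi^N}{\det(\mathbf{M}_1)} = \frac{\pi^N \det(\tilde{\boldsymbol{\Phi}})}{\det\left(\mathbf{I}_N + \frac{1}{\sigma^2}\tilde{\boldsymbol{\Phi}}\mathbf{P}_1^H\mathbf{P}_1\right)} \tag{36, 37}$$

which leads to

$$\int_{\mathbf{h}_1} P(\mathbf{h}_1|\mathbf{h}_2)\, P(\mathbf{h}'_1|\mathbf{h}_1)\, d\mathbf{h}_1 = \frac{\det\left(\mathbf{I}_N + \frac{1}{\sigma^2}\boldsymbol{\Phi}\mathbf{P}_1^H\mathbf{P}_1\right)^{-1}}{\pi^{M_1}\sigma^{2M_1}}\, e^{-C_1} \tag{38}$$

with, like previously, $\boldsymbol{\Phi}$ and $C_1$ the respective limits of $\tilde{\boldsymbol{\Phi}}$ and $\tilde{C}_1$ in the limiting process $\tilde{\mathbf{Q}} \to \mathbf{Q}$.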

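We now need to consider the outer integral, whose terms dependent on $\mathbf{h}_2$ we shall similarly develop (not forgetting $\tilde{C}_1$, which depends on $\mathbf{h}_2$) as

$$\frac{1}{\pi^{M_2+N}\sigma^{2M_2}\det(\tilde{\mathbf{Q}})} \int_{\mathbf{h}_2} \mathbf{h}_2\, e^{-(\mathbf{h}_2-\mathbf{k}_2)^H \mathbf{M}_2 (\mathbf{h}_2-\mathbf{k}_2) - C_2}\, d\mathbf{h}_2 \tag{39}$$

with

$$\begin{aligned} \mathbf{M}_2 &= \tilde{\mathbf{Q}}^{-1} + \lambda^2\tilde{\boldsymbol{\Phi}}^{-1} - \lambda^2\left(\tilde{\boldsymbol{\Phi}} + \frac{1}{\sigma^2}\tilde{\boldsymbol{\Phi}}\mathbf{P}_1^H\mathbf{P}_1\tilde{\boldsymbol{\Phi}}\right)^{-1} + \frac{1}{\sigma^2}\mathbf{P}_2^H\mathbf{P}_2 \\ &= \tilde{\mathbf{Q}}^{-1}\left(\left(1 + \frac{\lambda^2}{1-\lambda^2}\right)\mathbf{I}_N - \frac{\lambda^2}{1-\lambda^2}\left(\mathbf{I}_N + \frac{1}{\sigma^2}\tilde{\boldsymbol{\Phi}}\mathbf{P}_1^H\mathbf{P}_1\right)^{-1} + \frac{1}{\sigma^2}\tilde{\mathbf{Q}}\mathbf{P}_2^H\mathbf{P}_2\right) \\ \mathbf{k}_2 &= \mathbf{M}_2^{-1}\left(\frac{1}{\sigma^2}\mathbf{P}_2^H\mathbf{P}_2\mathbf{h}'_2 + \left(\mathbf{I}_N + \frac{1}{\sigma^2}\mathbf{P}_1^H\mathbf{P}_1\tilde{\boldsymbol{\Phi}}\right)^{-1}\frac{\lambda}{\sigma^2}\mathbf{P}_1^H\mathbf{P}_1\mathbf{h}'_1\right) \\ C_2 &= \frac{1}{\sigma^2}\mathbf{h}'^H_2\mathbf{P}_2^H\mathbf{P}_2\mathbf{h}'_2 - \mathbf{k}_2^H\mathbf{M}_2\mathbf{k}_2 + \frac{1}{\sigma^2}\mathbf{h}'^H_1\mathbf{P}_1^H\mathbf{P}_1\mathbf{h}'_1 - \mathbf{h}'^H_1\,\frac{1}{\sigma^2}\mathbf{P}_1^H\mathbf{P}_1\left(\mathbf{I}_N + \frac{1}{\sigma^2}\tilde{\boldsymbol{\Phi}}\mathbf{P}_1^H\mathbf{P}_1\right)^{-1}\frac{1}{\sigma^2}\tilde{\boldsymbol{\Phi}}\mathbf{P}_1^H\mathbf{P}_1\mathbf{h}'_1 \end{aligned}$$

with $M_2$ the number of pilots in the second pilot sequence.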

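In the expression of $C_2$, it is readily seen that expanding $\mathbf{k}_2^H\mathbf{M}_2\mathbf{k}_2$ leads to inverting $\mathbf{M}_2$. The $\tilde{\mathbf{Q}}^{-1}$ factor then cancels out (from the development of $\mathbf{k}_2$). Then $\mathbf{M}_2$ and $\mathbf{M}_2^{-1}$ cancel out as well (in the development of $\mathbf{M}_2\mathbf{k}_2$). Therefore, no problem of matrix inversion is found in those expressions. We can then take the limit to finally have

\begin{align}
\hat{\mathbf{h}}_2 &= \lim_{\tilde{\mathbf{Q}} \to \mathbf{Q}} \frac{\dfrac{\det(\mathbf{M}_2)^{-1}\det(\mathbf{M}_1)^{-1}}{\pi^{M_2+M_1}\sigma^{2(M_2+M_1)}\det(\tilde{\mathbf{Q}})\det(\tilde{\boldsymbol{\Phi}})}\, e^{-C_2}\, \mathbf{k}_2}{\dfrac{\det(\mathbf{M}_2)^{-1}\det(\mathbf{M}_1)^{-1}}{\pi^{M_2+M_1}\sigma^{2(M_2+M_1)}\det(\tilde{\mathbf{Q}})\det(\tilde{\boldsymbol{\Phi}})}\, e^{-C_2}} \tag{40} \\
&= \lim_{\tilde{\mathbf{Q}} \to \mathbf{Q}} \mathbf{k}_2 \tag{41} \\
&= \left(\left(1 + \frac{\lambda^2}{1-\lambda^2}\right)\mathbf{I}_N - \frac{\lambda^2}{1-\lambda^2}\left(\mathbf{I}_N + \frac{1-\lambda^2}{\sigma^2}\mathbf{Q}\mathbf{P}_1^H\mathbf{P}_1\right)^{-1} + \frac{1}{\sigma^2}\mathbf{Q}\mathbf{P}_2^H\mathbf{P}_2\right)^{-1}\mathbf{Q} \nonumber \\
&\quad \times \left(\frac{1}{\sigma^2}\mathbf{P}_2^H\mathbf{P}_2\mathbf{h}'_2 + \left(\mathbf{I}_N + \frac{1-\lambda^2}{\sigma^2}\mathbf{P}_1^H\mathbf{P}_1\mathbf{Q}\right)^{-1}\frac{\lambda}{\sigma^2}\mathbf{P}_1^H\mathbf{P}_1\mathbf{h}'_1\right) \tag{42}
\end{align}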



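This formula holds only when the channel length $L$ is known. Then, with the same notations as in previous sections, if the channel length were only known to belong to an interval $\{L_{\min}, \ldots, L_{\max}\}$, then

$$\hat{\mathbf{h}}_2 = \frac{\sum_{L=L_{\min}}^{L_{\max}} \det(\mathbf{A}_L)^{-1}\det(\mathbf{B}_L)^{-1}\, e^{-C_2^{(L)}}\, \mathbf{k}_2^{(L)}}{\sum_{L=L_{\min}}^{L_{\max}} \det(\mathbf{A}_L)^{-1}\det(\mathbf{B}_L)^{-1}\, e^{-C_2^{(L)}}} \tag{44}$$

with

$$\begin{aligned} \mathbf{A}_L &= \left(1 + \frac{\lambda^2}{1-\lambda^2}\right)\mathbf{I}_N - \frac{\lambda^2}{1-\lambda^2}\left(\mathbf{I}_N + \frac{1-\lambda^2}{\sigma^2}\mathbf{Q}_L\mathbf{P}_1^H\mathbf{P}_1\right)^{-1} + \frac{1}{\sigma^2}\mathbf{Q}_L\mathbf{P}_2^H\mathbf{P}_2 \\ \mathbf{B}_L &= \mathbf{I}_N + \frac{1-\lambda^2}{\sigma^2}\mathbf{Q}_L\mathbf{P}_1^H\mathbf{P}_1 \\ C_2^{(L)} &= \frac{1}{\sigma^2}\mathbf{h}'^H_2\mathbf{P}_2^H\mathbf{P}_2\mathbf{h}'_2 - (\mathbf{k}_2^{(L)})^H\mathbf{M}_2^{(L)}\mathbf{k}_2^{(L)} + \frac{1}{\sigma^2}\mathbf{h}'^H_1\mathbf{P}_1^H\mathbf{P}_1\mathbf{h}'_1 - \mathbf{h}'^H_1\,\frac{1}{\sigma^2}\mathbf{P}_1^H\mathbf{P}_1\left(\mathbf{I}_N + \frac{1-\lambda^2}{\sigma^2}\mathbf{Q}_L\mathbf{P}_1^H\mathbf{P}_1\right)^{-1}\frac{1-\lambda^2}{\sigma^2}\mathbf{Q}_L\mathbf{P}_1^H\mathbf{P}_1\mathbf{h}'_1 \end{aligned} \tag{45}$$

in which $\mathbf{k}_2^{(L)}$ and $\mathbf{M}_2^{(L)}$ denote $\mathbf{k}_2$ and $\mathbf{M}_2$ evaluated for the covariance matrix $\mathbf{Q}_L$.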



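The final formulas (42) and (45) are interesting in the sense that they do not directly carry any intuitive properties. Indeed, if we were to find an ad-hoc technique to weigh the relative importance of our prior information on $\mathbf{h}_2$, of the pilot data $\mathbf{h}'_2$ and of the past (or future) pilot data $\mathbf{h}'_1$, we would suggest a linear combination of those constraints. Our result is not linear in those constraints. However, it carries the expected intuition in the limits:

• when $\lambda = 0$, the past and present channels are completely uncorrelated, so that no information carried by the past pilots should be of any use. This is what is observed since then, equation (42) reduces to the LMMSE solution (15).

• when $\lambda \to 1$, then

$$\hat{\mathbf{h}}_2 \to \left(\mathbf{I}_N + \frac{1}{\sigma^2}\mathbf{Q}\mathbf{P}_1^H\mathbf{P}_1 + \frac{1}{\sigma^2}\mathbf{Q}\mathbf{P}_2^H\mathbf{P}_2\right)^{-1}\mathbf{Q}\left(\frac{1}{\sigma^2}\mathbf{P}_2^H\mathbf{P}_2\mathbf{h}'_2 + \frac{1}{\sigma^2}\mathbf{P}_1^H\mathbf{P}_1\mathbf{h}'_1\right) \tag{46}$$

which again has the form of the LMMSE solution, but now the past and present pilots $\mathbf{h}'_1$ and $\mathbf{h}'_2$ can be compiled into a single pilot sequence $\mathbf{h}''_2$ with the projector $\mathbf{R}_2$ such that $\mathbf{R}_2^H\mathbf{R}_2 = \mathbf{P}_1^H\mathbf{P}_1 + \mathbf{P}_2^H\mathbf{P}_2$,

$$\hat{\mathbf{h}}_2 \to \left(\mathbf{I}_N + \frac{1}{\sigma^2}\mathbf{Q}\mathbf{R}_2^H\mathbf{R}_2\right)^{-1}\mathbf{Q}\left(\frac{1}{\sigma^2}\mathbf{R}_2^H\mathbf{R}_2\mathbf{h}''_2\right) \tag{47}$$

Note also that (42) is linear in the variables $\mathbf{h}'_1$ and $\mathbf{h}'_2$, so that the final MMSE solution when $L$ is known is also the LMMSE solution.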

D. Time-frequency Channel Estimation 

Now, instead of merely estimating channels at times when pilot sequences are found, we can extend our scheme to estimate channels at any time position. For this, we shall in the following consider a channel $\mathbf{h}_{12}$ that we want to estimate, given the knowledge of pilot signals $\mathbf{h}'_1$ and $\mathbf{h}'_2$ found at positions of the respective $\mathbf{h}_1$ and $\mathbf{h}_2$ channels. We are also aware of $\lambda_1$ and $\lambda_2$, the respective time-correlation coefficients between the couples $(\mathbf{h}_1, \mathbf{h}_{12})$ and $(\mathbf{h}_2, \mathbf{h}_{12})$.

We assume for brevity here that the channel length L is known (this will avoid the heavy computation of some coefficients). 

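Using the same derivations as in the previous sections, the MMSE estimation for $\mathbf{h}_{12}$ is given by

$$\hat{\mathbf{h}}_{12} = \frac{1}{P(\mathbf{h}'_1\mathbf{h}'_2)} \int_{\mathbf{h}_{12}} \mathbf{h}_{12}\, P(\mathbf{h}_{12}) \left(\int_{\mathbf{h}_2} P(\mathbf{h}'_2|\mathbf{h}_2)\, P(\mathbf{h}_2|\mathbf{h}_{12})\, d\mathbf{h}_2\right) \left(\int_{\mathbf{h}_1} P(\mathbf{h}'_1|\mathbf{h}_1)\, P(\mathbf{h}_1|\mathbf{h}_{12})\, d\mathbf{h}_1\right) d\mathbf{h}_{12} \tag{48}$$

We do not provide the complete derivation, which is identical in spirit to all the previous derivations, and which finally gives

$$\begin{aligned} \hat{\mathbf{h}}_{12} = &\left(\mathbf{I}_N + \sum_{k=1}^{2} \frac{\lambda_k^2}{1-\lambda_k^2}\left(\mathbf{I}_N - \left(\mathbf{I}_N + \frac{1-\lambda_k^2}{\sigma^2}\mathbf{Q}\mathbf{P}_k^H\mathbf{P}_k\right)^{-1}\right)\right)^{-1}\mathbf{Q} \\ &\times \left(\lambda_1\left(\mathbf{I}_N + \frac{1-\lambda_1^2}{\sigma^2}\mathbf{P}_1^H\mathbf{P}_1\mathbf{Q}\right)^{-1}\frac{1}{\sigma^2}\mathbf{P}_1^H\mathbf{P}_1\mathbf{h}'_1 + \lambda_2\left(\mathbf{I}_N + \frac{1-\lambda_2^2}{\sigma^2}\mathbf{P}_2^H\mathbf{P}_2\mathbf{Q}\right)^{-1}\frac{1}{\sigma^2}\mathbf{P}_2^H\mathbf{P}_2\mathbf{h}'_2\right) \end{aligned} \tag{49}$$

which generalizes equation (42).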

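This is further generalized for a given number $K$ of pilot signal sequences $\mathbf{h}'_k$, $k \in \{1, \ldots, K\}$ (sent through channels $\mathbf{h}_k$), and a channel $\mathbf{h}$ (with inverse Fourier transform $\boldsymbol{\nu}$) which satisfies

$$\forall (p, k) \in \{1, \ldots, L\} \times \{1, \ldots, K\}, \quad E[\nu_p\, \nu^*_{k,p}] = \frac{\lambda_k}{L} \tag{50}$$

The maximum entropy principle in this situation gives the MMSE estimator for $\mathbf{h}$,

$$\hat{\mathbf{h}} = \left(\mathbf{I}_N + \sum_{k=1}^{K} \frac{\lambda_k^2}{1-\lambda_k^2}\left(\mathbf{I}_N - \left(\mathbf{I}_N + \frac{1-\lambda_k^2}{\sigma^2}\mathbf{Q}\mathbf{P}_k^H\mathbf{P}_k\right)^{-1}\right)\right)^{-1}\mathbf{Q} \sum_{k=1}^{K} \lambda_k\left(\mathbf{I}_N + \frac{1-\lambda_k^2}{\sigma^2}\mathbf{P}_k^H\mathbf{P}_k\mathbf{Q}\right)^{-1}\frac{1}{\sigma^2}\mathbf{P}_k^H\mathbf{P}_k\mathbf{h}'_k \tag{51}$$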
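A numerical sketch of the time-frequency estimator for $K$ pilot sequences with known correlation factors $\lambda_k$ is given below; the two-sequence formula (42) corresponds to $K = 2$ with a correlation factor of 1 for the present pilots. The sketch assumes the covariance model $Q_{nm} = \frac{1}{L}\sum_{l<L} e^{-2\pi i\, l(n-m)/N}$ and evaluates the estimator in the rearranged form $\hat{\mathbf{h}} = (\mathbf{I}_N + \sum_k \frac{\lambda_k^2}{\sigma^2}\mathbf{G}_k)^{-1}\sum_k \frac{\lambda_k}{\sigma^2}\mathbf{G}_k\mathbf{h}'_k$ with $\mathbf{G}_k = (\mathbf{I}_N + \frac{1-\lambda_k^2}{\sigma^2}\mathbf{Q}\mathbf{P}_k^H\mathbf{P}_k)^{-1}\mathbf{Q}\mathbf{P}_k^H\mathbf{P}_k$. This rearrangement, which avoids dividing by $1-\lambda_k^2$, and all function names are assumptions of this illustration.

```python
import cmath


def covariance(n_subcarriers, channel_length):
    """Q[n][m] = (1/L) * sum_{l<L} exp(-2*pi*i*l*(n-m)/N) (assumed i.i.d. tap model)."""
    N, L = n_subcarriers, channel_length
    return [[sum(cmath.exp(-2j * cmath.pi * l * (n - m) / N) for l in range(L)) / L
             for m in range(N)] for n in range(N)]


def solve_columns(a, rhs):
    """Solve a X = rhs (rhs given as a list of rows) by Gaussian elimination with pivoting."""
    n, width = len(a), len(rhs[0])
    aug = [list(a[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            if factor != 0:
                for c in range(col, n + width):
                    aug[r][c] -= factor * aug[col][c]
    x = [[0j] * width for _ in range(n)]
    for r in range(n - 1, -1, -1):
        for j in range(width):
            acc = aug[r][n + j] - sum(aug[r][c] * x[c][j] for c in range(r + 1, n))
            x[r][j] = acc / aug[r][r]
    return x


def time_frequency_mmse(pilot_obs, pilot_masks, lambdas, channel_length, sigma2):
    """Estimate h from K pilot observations h'_k with masks phi_k and correlations lambda_k."""
    N = len(pilot_obs[0])
    q = covariance(N, channel_length)
    lhs = [[(1.0 + 0j) if n == m else 0j for m in range(N)] for n in range(N)]
    rhs = [[0j] for _ in range(N)]
    for h_prime, phi, lam in zip(pilot_obs, pilot_masks, lambdas):
        c = (1.0 - lam * lam) / sigma2
        qp = [[q[n][m] * phi[m] for m in range(N)] for n in range(N)]  # Q P_k^H P_k
        a = [[(1.0 if n == m else 0.0) + c * qp[n][m] for m in range(N)] for n in range(N)]
        g = solve_columns(a, qp)  # G_k = (I + c_k Q P_k^H P_k)^{-1} Q P_k^H P_k
        for n in range(N):
            for m in range(N):
                lhs[n][m] += lam * lam / sigma2 * g[n][m]
            rhs[n][0] += lam / sigma2 * sum(g[n][m] * h_prime[m] for m in range(N))
    return [row[0] for row in solve_columns(lhs, rhs)]
```

With a single sequence and a correlation factor of 1, the function returns the LMMSE solution (15); a sequence with a correlation factor of 0 contributes nothing to the estimate, in line with the limits discussed for (42).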
E. Unknown correlation factor $\lambda$

The reader might object at this point that the prior knowledge either of the vehicular speed or of the mean correlation factor $\lambda$ might only be accessible through yet another estimation process. Moreover, $\lambda$ is expressed as an expectation, so that some time is possibly required to obtain an accurate estimate. This of course goes against our idea of fast channel estimation.

In the same vein as we did previously with the possibly unknown parameter $L$, we can equally integrate out the parameter $\lambda$ from our formulas. In general this might be a difficult task, since the estimated channel $\hat{\mathbf{h}}$ would then read



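\begin{align}
\hat{\mathbf{h}} &= \int_{\mathbf{h}} \mathbf{h}\, P(\mathbf{h}|\mathbf{h}'_1, \ldots, \mathbf{h}'_K)\, d\mathbf{h} \tag{52} \\
&= \int_{\mathbf{h}} \left(\int_{\lambda} \mathbf{h}\, P(\mathbf{h}|\mathbf{h}'_1, \ldots, \mathbf{h}'_K, \lambda)\, P(\lambda)\, d\lambda\right) d\mathbf{h} \tag{53}
\end{align}

in which $P(\lambda)$ is our prior knowledge on the parameter $\lambda$. But, going further in the computation, this last integration is rather involved. It could then be well approximated by the finite summation

\begin{align}
\hat{\mathbf{h}} &\simeq \int_{\mathbf{h}} \left(\sum_{\lambda} \mathbf{h}\, P(\mathbf{h}|\mathbf{h}'_1, \ldots, \mathbf{h}'_K, \lambda)\, P(\lambda)\right) d\mathbf{h} \tag{54} \\
&= \sum_{\lambda} P(\lambda) \int_{\mathbf{h}} \mathbf{h}\, P(\mathbf{h}|\mathbf{h}'_1, \ldots, \mathbf{h}'_K, \lambda)\, d\mathbf{h} \tag{55}
\end{align}

As shall be illustrated in Section IV, it actually makes almost no difference whether $\lambda$ is assumed known or not. This suggests that our estimators are themselves able to cope with the lack of information concerning $\lambda$, just by inductive inference on $\lambda$ from the data $\mathbf{h}'_1, \ldots, \mathbf{h}'_K$.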
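The finite summation (55) amounts to averaging, over a grid of candidate correlation factors, the estimates obtained for each fixed value of $\lambda$. A minimal sketch follows, in which `estimator` is a hypothetical callable returning the channel estimate for a given $\lambda$ (for instance an implementation of the known-$\lambda$ estimator of the previous sections); the grid and the prior weights are left to the user.

```python
def average_over_lambda(estimator, candidates, prior):
    """Finite-sum approximation: sum over lambda of P(lambda) * estimate(lambda)."""
    norm = sum(prior)  # normalize in case the prior weights do not sum to one
    estimates = [estimator(lam) for lam in candidates]
    length = len(estimates[0])
    return [sum(p / norm * est[n] for p, est in zip(prior, estimates))
            for n in range(length)]
```

A uniform prior over a few values of $\lambda$ in a plausible range is the simplest choice when nothing else is known about the correlation.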

F. Non-homogeneous SNR 

In all previous sections, we considered a homogeneous noise power over the frequency bandwidth. This situation is usually far from real (and often far from the actual knowledge the receiver has on the noise correlation matrix). Typically, in the presence of strong interferers working on selective frequencies, the noise correlation matrix $\mathbf{C}_n = E[\mathbf{n}\mathbf{n}^H]$ is not proportional to an identity matrix (and does not even have any particular tendency to be diagonal).

If the information about the noise is updated to consider $\mathbf{C}_n$, then all the previous equations are to be updated by replacing all terms $\frac{1}{\sigma^2}\mathbf{P}^H\mathbf{P}$ by the corrected terms $\mathbf{P}^H\mathbf{C}_n^{-1}\mathbf{P}$.
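As a worked example of this substitution, consider the LMMSE solution (15). Writing it first with the factor $\frac{1}{\sigma^2}\mathbf{P}^H\mathbf{P}$ made explicit and then applying the replacement rule gives the following sketch (our rewriting, under the substitution rule stated above):

```latex
\begin{align*}
\hat{\mathbf{h}}
  &= \left(\mathbf{I}_N + \mathbf{Q}\,\tfrac{1}{\sigma^2}\mathbf{P}^H\mathbf{P}\right)^{-1}
     \mathbf{Q}\,\tfrac{1}{\sigma^2}\mathbf{P}^H\mathbf{P}\,\mathbf{h}'
     && \text{(equation (15), homogeneous noise)} \\
  &\longrightarrow
     \left(\mathbf{I}_N + \mathbf{Q}\,\mathbf{P}^H\mathbf{C}_n^{-1}\mathbf{P}\right)^{-1}
     \mathbf{Q}\,\mathbf{P}^H\mathbf{C}_n^{-1}\mathbf{P}\,\mathbf{h}'
     && \text{(noise correlation matrix } \mathbf{C}_n\text{)}
\end{align*}
```

Setting $\mathbf{C}_n = \sigma^2\mathbf{I}_N$ in the second line recovers the first one, as expected.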




IV. Simulations and Results 

In this section, we propose simulations of the previously derived formulas. In order to produce insightful plots, we consider a short discrete Fourier transform of size $N = 32$. In a first simulation, we assume the channel length $L = 3$ is unknown to the receiver, which only knows $L \in \{1, \ldots, 6\}$. The pilot symbols are separated by 6 subcarriers as in the 3GPP-LTE standard [2] and as depicted in Figure 1. The SNR is set to SNR = 20 dB. The results are proposed in Figure 2, which compares the novel Bayesian MMSE solution to the classical LMMSE solution assuming maximum channel length $L = L_{\max} = 6$. It is observed that our solution has better results than LMMSE in this particular case, due to the prior error in $L$ that does not occur in this novel scheme.

A comparison between the performance of our Bayesian MMSE estimator when the channel length $L$ is known and when it is unknown is then proposed in Figure 3. We take here a channel length $L = 5$ that is known to belong to the range $\{1, \ldots, 10\}$. Interestingly, the performance decay due to the absence of knowledge on $L$ is not large. If we consider the previous situation, i.e. $L = 3$ for a range $\{1, \ldots, 6\}$, we even visually observe no performance difference.

This raises a very interesting feature of the inductive reasoning framework since, by trying to infer on the channel knowledge given the received signal $(\mathbf{h}|\mathbf{y})$, the Bayesian framework also encompasses inference on $L$. Indeed,

$$P(L|\mathbf{y}) = \frac{P(L)}{P(\mathbf{y})} \int_{\mathbf{h}} P(\mathbf{y}|\mathbf{h})\, P(\mathbf{h}|L)\, d\mathbf{h} \tag{57}$$

in which the integral is the same as in Section III-A and $P(L)$ is the uniform prior distribution for $L$.

The inductive inference on every hypothesis $L = 1, \ldots, L_{\max}$ can then be compared thanks to the evidence function defined by Jaynes [13], i.e. the base-10 logarithm of the odds of each hypothesis.

The results, for different SNR, are proposed in Figure 5. However, since the evidence for any hypothesis on $L$ is not large, we instead draw the odds function

$$O(L|\mathbf{y}) = \frac{P(L|\mathbf{y})}{1 - P(L|\mathbf{y})} \tag{59}$$

which is ten to the power of the evidence function.
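The posterior, odds and evidence of each channel length hypothesis can be computed from unnormalized log-weights as sketched below. The sketch assumes the usual definition of the odds of a hypothesis, $P/(1-P)$, and takes the evidence as its base-10 logarithm so that the odds are ten to the power of the evidence; the function names are hypothetical.

```python
import math


def posterior_from_log_weights(log_weights):
    """Normalize unnormalized log-probabilities into a posterior distribution."""
    top = max(log_weights)
    weights = [math.exp(v - top) for v in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def odds_and_evidence(posterior):
    """Odds p / (1 - p) of each hypothesis and their base-10 logarithm."""
    odds, evidence = [], []
    for p in posterior:
        o = math.inf if p >= 1.0 else p / (1.0 - p)
        odds.append(o)
        evidence.append(-math.inf if o == 0.0 else math.log10(o))
    return odds, evidence
```

Hypotheses with odds above 1 (positive evidence) are more probable than all the alternatives taken together.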

We observe, as predicted, that the evidence for $L = 5$ is raised higher than for the other hypotheses and that this behaviour is especially noticeable when the SNR is high. This produces an updated posterior distribution $P(L|\mathbf{y})$ that almost discards all wrong hypotheses. Therefore, at high SNR, the impact of the hypotheses $L \neq 5$ in the final equations is negligible. This automatic inference on the channel length is a direct consequence of the proposed MMSE formula that would not be possible through classical orthodox probability approaches.

Fig. 3. Channel estimated energy - $N = 32$, $L = 5$, $L_{\max} = 10$, SNR = 20 dB

Fig. 4. Mean square error of channel estimation when $L$ is unknown - SNR = 20 dB, $N = 128$, $L = 6$

We want in the following to observe first the effects of using time correlation properties before dealing with performance figures. In Figures 7 and 8 we propose the situation of two pilot sequences corresponding to two correlated channels with correlation factor $\lambda = 0.99$. We estimate here one of the two channels using either one or both pilot sequences. It is observed that the high correlation $\lambda$ allows the estimation of long channels. Indeed, the high density of pilots in the time-frequency grid allows to better approach the genuine channel. This is shown in Figure 7 on a random channel, for $N = 32$, $L = 6$, SNR = 20 dB, which clearly illustrates that using both pilot signals increases the accuracy of the estimator.

Another beneficial effect is observed when the noise power is strong (or equivalently when the SNR is low). Then, using twice as many samples for channel estimation leads effectively to twice as accurate channel estimation, provided that both channels are strongly correlated. This is demonstrated in Figure 8, in which we used SNR = 10 dB.

The corresponding numerical performances are proposed in Figures 9 and 10 respectively. In the former, the channel length is $L = 5$ for a DFT size $N = 64$. A single train of pilots is then enough to estimate the channel. However, as previously discussed, in the presence of highly time-correlated channels, the estimation noise can be reduced using both trains of pilots. Essentially, it is observed here that, since the number of pilots for both sequences is almost equal, the channel estimation based on both trains of pilots is realized on twice as many pilot positions. Therefore, when the time-correlation $\lambda$ is high, for low SNR, up to 3 dB gain can be observed.

As for long channels, it is clear in Figure 10 that high correlation between channels in time is required to obtain accurate channel estimates. Indeed, while not using time-adjacent channels proves devastating (over 6 dB of mean square error), high time-correlations allow to significantly reduce the mean square error.

Finally, in Figure 11 we propose to compare the performance of a channel estimator using two pilot sequences correlated in time, depending on whether the process knows or does not know the exact time-correlation coefficient $\lambda$, set here to $\lambda = 0.93$. The channel length $L = 15$ is known and the DFT size is kept to $N = 64$. Surprisingly, a poor prior knowledge on $\lambda$ does not strongly impact the final mean quadratic error of the estimation. At least, when the correlation is known to be more than 1/2, the performance in terms of mean square error is similar to that obtained when $\lambda$ is perfectly known. This suggests, as already mentioned, that the Bayesian machinery is able to efficiently infer on $\lambda$ whatever the SNR level. The reader must be wary that this last sentence does not imply at all that the time-correlation parameter does not intervene in the system performance. What we stated above is just that prior knowledge about this parameter is not mandatory, since the posterior distribution for $\lambda$ (given $\mathbf{h}'_1$, $\mathbf{h}'_2$) is already very peaked around the correct value of $\lambda$.

V. Discussion 

In the following, we discuss the advantages of the general framework which encompasses the previous channel estimators. 
Some limitations, concerning complexity mainly, are also considered. Finally we discuss the potential drawbacks in using 
alternative channel estimation techniques. 

Those channel estimation algorithms were proposed on the sole basis of Jaynes' probability theory [13], which mainly encompasses the Bayesian rule and the maximum entropy principle.¹ Those rules can be applied to larger problems than the mere scope of channel estimation. For instance, optimal Bayesian signal detection is proposed in [20]. Also, maximum entropy channel models are derived in [21]. All those studies lead to the general idea of cognitive receivers. Indeed, a few years after the introduction of the concept of cognitive radios [22], some attempts have been proposed to clearly define the fundamental basis of a cognitive radio [23], but still no correct definition has been derived. This has the rather unpleasant consequence that many contributions on cognitive techniques are each based on very different fundamental assumptions. In our minds, probability theory as extended logic is an interesting candidate for the information theoretic definition of a cognitive receiver. By the latter, we mean a receiving terminal which, given any amount of information $I$, is able to optimally infer on any system parameter. This way the device would be capable of learning. Having done that, it will then have to take decisions, i.e. what information to send back with which accuracy, what additional information to request etc., so as to maximize some utility function. However, this second requirement for a cognitive receiver goes beyond the mere scope of Jaynes' probability theory and involves decision theoretic discussions as well as epistemological considerations such as the brilliant inquiry theory from Cox [15] and Knuth [16].

¹ The Bayesian rule has actually been proven to be a particular case of the generalized ME principle [14].

Fig. 6. Mean evidence gap of channel length recovery - SNR = 20 dB, $N = 128$, $L = 6$, $L_{tested} \in \{1, \ldots, 9\}$

This being said, the reader will then raise the objection that our current work obviously did not consider all the information available to the receiver. Indeed, as previously mentioned, if pilots were designed to help channel estimation, then informative data² also carry information on the channel they face. We could mentally envision a joint channel estimator and signal decoder device that would help recover a better estimate of the channel and of the sent signal. But of course, at this point, we would face strong implementation difficulties as well as, probably, involved mathematical problems. Still, through this study we suggest that in general ad-hoc solutions are not the direction to head for when one wishes to have both good performance and non-involved algorithms. If one wishes to reduce the computational load of an algorithm, it is preferable to develop the full Bayesian solution and only then start to simplify the mathematical expressions. For instance, in our previous examples, when the channel length $L$ is unknown but the complexity of summing up hundreds of potential candidates for $L$ is too large, a solution that only considers ten sampled values for $L$ could be envisioned. This is the correct way to keep a grasp on what simplification we performed; in classical techniques, from the very beginning, strong assumptions and approximations are made whose effects are often invisible in terms of performance (and might only be observable through extensive simulations).

One of those classical techniques in the channel estimation realm is proposed, among others, in [19]. The basic idea is very insightful since it consists in estimating present channels based on the knowledge of the estimation done on a past channel and the time-correlation coefficient that links both channels. Such solutions might seem interesting in the fact that, by recursion, the previous estimate "carries the information on all past pilot signals", but this is actually very misleading. Indeed, this estimate does not actually contain the whole information on the previous pilot signals. It merely consists in some post-filtering result of those pilot signals. We might then raise a few objections to using previous estimates:

• if we were to "select" information³, then we would rather consider some of the past (and possibly future) pilot signals than previous estimates,

• if only the previous estimates are available, then it would seem dishonest not to mention to the Bayesian machinery that those actually are estimates of a channel. This suggests that, when using those previous estimates, it must be somehow mentioned in the equations that they are MMSE or least-square [5] estimates of the channel, that originated from pilot sequences present (if this is also known) at particular positions, and so on and so forth. This would allow the Bayesian process to provide inductive inference on the actual pilot signals and then make derivations similar to those we have proposed along this paper. Any other usage of previous estimates could not be claimed optimal.

² By informative, we mean here data dedicated to communication purposes and not synchronization purposes.

³ Which is dishonest according to the fundamental desiderata of Jaynes' theory [13].

Fig. 7. Channel estimated energy with time correlation - 99% correlation, $N = 32$, $L = 6$, SNR = 20 dB

Fig. 8. Channel estimated energy with time correlation - 99% correlation, $N = 32$, $L = 3$, SNR = 10 dB

VI. Conclusion 

In this work, a novel view on channel estimation for OFDM systems is proposed. Optimal formulas for the mean square error criterion are derived that re-demonstrate known classical solutions, while new formulas are also proposed for scenarios based on different levels of knowledge. The whole work can be summarized as a unique novel framework that allows the integration of any information the receiver is aware of, in order to perform optimal channel inference based on this knowledge. Also, some hints on the long-term introduction of foundations for cognitive receivers are suggested that would encompass the present work.

Fig. 9. Estimation Mean Square Error - Short channel - $N = 64$, $L = 5$

Fig. 10. Estimation Mean Square Error - Long channel - $N = 64$, $L = 15$